Take home messages

Precision, imprecision, higher-order probabilities

Higher-order probabilities (HOP) handle the problems faced by precise and imprecise probabilism (PP/IP) better than either view does.

Weight of evidence

HOPs allow for a principled, information-theoretic account of weight of evidence, which works better than previous proposals within PP and IP.

Reasoning

The approach is computationally feasible and can be implemented with Bayesian Networks.

The plan

  • Introducing imprecise probabilism

  • Challenges to imprecise probabilism

  • Higher-order probabilities

  • Explications of weight of evidence so far

  • Information theory and weight

Introducing imprecise probabilism

Precise probabilism (PP)

A rational agent’s (RA) degrees of belief are to be represented by means of a single probability measure defined over every proposition she entertains

Example: fair coin

\[ \mathsf{P}(H) = \mathsf{P}(\neg H)=.5\]

Example: Unknown bias (PIE?)

\[\mathsf{P}(H) = \mathsf{P}(\neg H)=.5 \]

Imprecision and evidence responsiveness

Locality

RA’s credal stance (in a wide, non-technical sense) about a proposition is to be captured by whatever probability (or probabilities) she assigns to it, and does not depend on what probabilities RA assigns to logically independent propositions

Trouble with evidence responsiveness

PP can’t distinguish between (Fair coin) and (Unknown bias), nor between many other such pairs of cases

Trouble with sweetening

If RA doesn’t know what the bias of the coin is, learning that the bias has now increased by .001 might still leave RA undecided

Independence preservation fails for linear pooling

Several other related limitative results (Dietrich & List, 2016)

Imprecise probabilism (IP)

Representors

RA’s credal stance towards \(H\) is to be represented by means of the set \(\mathbb{P}\) of those probability measures which are compatible with her evidence.

\[ \mathbb{P}_{t_1} = \{\mathsf{P}_{t_1}\vert \exists\, {\mathsf{P}_{t_0} \!\in \mathbb{P}_{t_0}}\,\, \forall\, {H}\,\, \left[\mathsf{P}_{t_1}(H)=\mathsf{P}_{t_0}(H \vert E)\right] \}. \]

(Bradley, 2019; Fraassen, 2006; Gärdenfors & Sahlin, 1982; Joyce, 2005; Kaplan, 1968; Keynes, 1921; Levi, 1974; Sturgeon, 2008; Walley, 1991)
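The representor update above can be sketched in code: each member of the representor is conditionalized on the evidence, and measures that assign the evidence probability zero are dropped. This is a minimal sketch over a finite outcome space; the two-bias toy representor and all names below are illustrative, not from the talk.

```python
# Sketch: updating a representor by conditioning each member on evidence E.
# Finite measures are dicts from outcomes to probabilities.

def condition(P, E):
    """Conditionalize a finite measure P (dict) on event E (set of outcomes)."""
    pE = sum(p for w, p in P.items() if w in E)
    return {w: (p / pE if w in E else 0.0) for w, p in P.items()}

def update_representor(representor, E):
    """P_{t1} = { P(. | E) : P in P_{t0} }, dropping measures with P(E) = 0."""
    return [condition(P, E) for P in representor
            if sum(p for w, p in P.items() if w in E) > 0]

# Two hypotheses about a coin's bias, outcomes of two independent tosses:
outcomes = ["HH", "HT", "TH", "TT"]

def iid(b):
    """The measure over two tosses induced by bias b for heads."""
    return {w: (b if w[0] == "H" else 1 - b) * (b if w[1] == "H" else 1 - b)
            for w in outcomes}

rep = [iid(0.4), iid(0.6)]
E = {"HH", "HT"}                      # evidence: the first toss landed heads
updated = update_representor(rep, E)  # each member now conditioned on E
```

Each updated member is again a probability measure, so the representor can be updated repeatedly as evidence accumulates.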

Evidence responsiveness

  • Fair coin: as PP

  • Unknown bias: all possible probability measures

Indifference and indecision

  • Indifference: \(A\) and \(B\) are equally likely

  • Indecision: super-valuationist comparison yields no such determination

(Kaplan, 1968)

No pooling in aggregation

Bundle them up, win the war of independence!

Challenges to imprecise probabilism

More evidence responsiveness

Two biases

The bias is either .4 or .6

\(\mathsf{P}_1, \mathsf{P}_2\) such that \(\mathsf{P}_1(H)=.4\) and \(\mathsf{P}_2(H)=.6\)?

Two unbalanced biases?

The bias is either .4 or .6, and .4 is three times more likely than .6

Comparison revisited: Rinard’s mystery urns

  • GREEN contains only green marbles

  • no information about MYSTERY

Intuitions here

  • RA should be certain that the marble drawn from GREEN will be green (\(G\)),

  • RA should be more confident about \(G\) than that the marble from MYSTERY will be green (\(M\))

The trouble for IP

  • For each \(r\in [0,1]\), RA’s representor contains a \(\mathsf{P}\) with \(\mathsf{P}(M)=r\)

  • Including the one with \(\mathsf{P}(M)=1\)

  • So it is not the case that \(\mathsf{P}(G) > \mathsf{P}(M)\) for every \(\mathsf{P}\) in RA’s representor

  • So, on IP, RA is not more confident in \(G\) than in \(M\)
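The supervaluationist comparison failure can be checked directly. Below, RA’s representor for MYSTERY is approximated by a grid over \([0,1]\); the grid resolution and variable names are illustrative assumptions.

```python
# Toy check of the supervaluationist comparison in Rinard's urn case.

grid = [i / 100 for i in range(101)]               # candidate values for P(M)
representor = [{"G": 1.0, "M": r} for r in grid]   # P(G) = 1 in every member

# "RA is more confident in G than in M" on the supervaluationist reading:
# every measure in the representor must rank G strictly above M.
strictly_more_confident = all(P["G"] > P["M"] for P in representor)
# This fails, because the representor contains a member with P(M) = 1.
```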

Belief inertia

Trouble for IP

(Levi, 1980)

General problems with IP

Proper scoring rule is impossible

See results by Seidenfeld 2012, Mayo-Wilson 2016, Schoenfield 2017, Campbell-Moore 2020

Aggregating doesn’t fly far

  • Taking unions leads to skepticism. What else?

  • Can’t model synergy

Unclear mechanism of evidential constraints

  • “Drop measures excluded by the evidence”. But how (other than degenerate cases)?

  • How exactly is non-testimonial evidence of chances, \(\mathsf{P}(X) = x\) or \(\mathsf{P}(X) \in [x,y]\), supposed to arise?

Higher-order probabilities

Second-order approach to uncertainty

Key ideas

  • Uncertainty is not a one-dimensional quantity to be mapped onto a single scale such as the real line.

  • It is the shape of the whole distribution over parameter values that should be taken into consideration.

  • Summaries are just that.

Some simple examples

Dealing with problems for IP

  • Much more evidence sensitive (and can go higher order if needed)

  • Seamless integration with Bayesian statistics explains learning

  • Belief inertia does not arise

  • HPDI comparison avoids Rinard’s objection

  • There is a proper scoring rule (Urbaniak 2022)

  • Evidence-based aggregation is accuracy-wise better than averaging and can model synergy

HOP and BNs

HOP & BNs

Explications of weight so far

Balance vs. weight

Precursors

  • Beans from a bag, two colors, same observed proportion, different sample sizes (C. S. Peirce, 1872).

  • The notion of weight: balance might remain the same while the amount of relevant evidence shifts (Keynes 1921).

Desiderata

  • Balance undetermination: Different weights are possible with the same balance.

  • Weak (strong) increase: In Bernoulli trials, weight does not decrease (strictly increases) with sample size, keeping the observed frequency fixed.

  • Frequency monotonicity: In Bernoulli trials, keeping sample size fixed, weight does not decrease as the observed frequency moves further from .5.

No unrestricted monotonicity

Weatherson, Joyce, Runde

  • A straight flush has probability \(\frac{40}{2,598,960}\).

  • The player starts behaving confusingly and bluffing.

Weight and precise probabilism

Hamer’s certainty

Hamer’s measure (the absolute distance of the balance from 1 or 0, whichever is closer) fails at Balance undetermination.

Good’s desiderata and weight

  • \(W(H:E)\) is some function of \(\mathsf{P}(E\vert H), \mathsf{P}(E\vert \neg H)\)

  • \(\mathsf{P}(H \vert E) = g[W(H:E), \mathsf{P}(H)]\)

  • \(W(H: E_1 \wedge E_2) = W(H:E_1) + W(H:E_2 \vert E_1)\)

\[W(H:E) = \log \frac{\mathsf{P}(E \vert H)}{\mathsf{P}(E\vert \neg H)}\]
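Good’s weight is just the log likelihood ratio, and his additivity desideratum falls out of it directly. A minimal sketch, with illustrative numbers for a coin-bias hypothesis:

```python
import math

# Good's weight of evidence: W(H:E) = log( P(E|H) / P(E|not-H) ).

def good_weight(p_e_given_h, p_e_given_not_h):
    return math.log(p_e_given_h / p_e_given_not_h)

# H: the coin has bias .6 for heads; not-H: the coin is fair.
w_heads = good_weight(0.6, 0.5)   # one observed head: positive weight
w_tails = good_weight(0.4, 0.5)   # one observed tail: negative weight

# Additivity: for independent tosses, weights simply add up, since the
# likelihoods multiply and the logarithm turns products into sums.
w_head_then_tail = good_weight(0.6 * 0.4, 0.5 * 0.5)
```

Note that a balanced head/tail sequence yields a weight near zero no matter how long it is, which is one way to see why this measure fails at weak increase.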

Good’s weight fails at weak increase

Nine fair dice, one with bias \(\frac{1}{3}\) for six.

Intervals

Kyburg’s Evidential Probability

\(\mathsf{EP}(H \vert E \wedge K) = [x,y]\)

  • Sharpening by richness (prefer frequencies from full joint distributions)

  • Sharpening by specificity (prefer proper subsets)

  • Sharpening by precision (pick single subinterval if it exists, otherwise, shortest possible cover of minimal subintervals)

Pedden’s weight

Let \(\mathsf{EP}(H \vert E \wedge K) = [x,y]\), then:

\(\mathsf{W}(H \vert E \wedge K) = 1 - (y - x)\).

Troubles with EP

  • Pedden picks edges by error margins (sensitivity!)

  • Also, the measure is sensitive only to what happens at the interval’s edges.

  • How to deploy outside of combinatorial or frequentist contexts?

  • Reasoning with intervals is hard to model sensibly (it does not preserve structural information)

Joyce’s weight

\(w(X,E) = \sum_x \vert c(ch(X) = x \vert E) \times (x - c(X\vert E))^2 - c(ch(X) = x) \times (x - c(X))^2\vert\)

Notice

  • no use of representors

  • drop of (Locality)
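A toy numerical rendering of Joyce’s formula may help. The chance grid and the prior/posterior credence vectors below are illustrative assumptions, not values from the talk.

```python
# Joyce's weight: w(X,E) = sum_x | c(ch(X)=x | E) * (x - c(X|E))^2
#                                - c(ch(X)=x)     * (x - c(X))^2 |
# where c is credence and ch(X) = x are chance hypotheses.

chances = [0.25, 0.5, 0.75]     # candidate chance values for X
prior = [1 / 3, 1 / 3, 1 / 3]   # c(ch(X) = x): flat over chance hypotheses
posterior = [0.1, 0.2, 0.7]     # c(ch(X) = x | E): evidence favors high chance

def expectation(weights, values):
    return sum(w * v for w, v in zip(weights, values))

c_X = expectation(prior, chances)        # c(X): prior credence in X
c_X_E = expectation(posterior, chances)  # c(X | E): posterior credence in X

joyce_weight = sum(
    abs(post * (x - c_X_E) ** 2 - pri * (x - c_X) ** 2)
    for x, pri, post in zip(chances, prior, posterior)
)
```

As the sketch makes explicit, the measure compares pre- and post-evidence spread of credence over chance hypotheses, so it needs no representor but does rely on credences about logically independent chance propositions, hence the drop of (Locality).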

Information theory and weight

Information theory crash-course

\(m=8\) possible destinations can be reached by making decisions at \(\log_2(8)=3\) forks

Surprise, information, entropy

Surprise of an outcome \(x\): \(1/\mathsf{P}(x)\)

Information (in bits): \(\log_2(\mathsf{surprise}) = - \log_2\mathsf{P}(x)\)

\(H(X) = \sum \mathsf{P}(x_i) \log_2 \frac{1}{\mathsf{P}(x_i)} = - \sum \mathsf{P}(x_i) \log_2 \mathsf{P}(x_i)\)

(the expected amount of information you receive once you learn what the value of \(X\) is)
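The definitions above translate directly into code. A minimal sketch for finite distributions; the function names are mine, and the examples tie back to the forks illustration:

```python
import math

def information(p):
    """Information (in bits) of an outcome with probability p: -log2(p)."""
    return -math.log2(p)

def entropy(P):
    """H(X) = sum_i P(x_i) * log2(1 / P(x_i)), skipping zero-probability outcomes."""
    return sum(p * information(p) for p in P if p > 0)

# A uniform choice among 8 destinations carries log2(8) = 3 bits --
# exactly the three forks from the crash-course example.
bits_uniform8 = entropy([1 / 8] * 8)

# A heavily biased coin carries less than 1 bit: its outcome is less surprising
# on average than a fair coin's.
bits_biased = entropy([0.9, 0.1])
```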

Weight of a distribution

Key idea

The more informative a distribution is, as compared to the uniform distribution, the more weight it has, on a scale from 0 to 1

\[\mathsf{w}(\mathsf{P}) = 1 - \frac{H(\mathsf{P})}{H(\mathsf{uniform})}\]
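A sketch of this weight measure for finite distributions, with illustrative helper names and example distributions:

```python
import math

def entropy(P):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in P if p > 0)

def weight(P):
    """w(P) = 1 - H(P) / H(uniform): 0 for uniform, 1 for a point mass."""
    n = len(P)
    return 1 - entropy(P) / math.log2(n)

w_uniform = weight([0.25] * 4)          # no evidence beyond indifference
w_point = weight([1.0, 0.0, 0.0, 0.0])  # maximally informative distribution
w_middling = weight([0.7, 0.1, 0.1, 0.1])
```

Because entropy never exceeds that of the uniform distribution over the same outcomes, the measure is automatically confined to the 0-to-1 scale.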

Weight of a distribution

Weak increase holds

Works for a variety of shapes

Abuse and rocking example

Back to the Sally Clark BN

Expected weight

Wrapping up

The higher-order approach

  • Leads to more honesty in uncertainty assessment
  • Is more sensible than sensitivity analysis
  • Integrates with Bayesian data analysis
  • Leads to an information-theoretic account of evidential weight
  • Is computationally feasible

Other things I wish I had time to discuss

  • connections with precise vs. imprecise probabilism in formal epistemology

  • problems with existing opinion aggregation methods and a higher-order approach

  • modeling synergy of multiple sources of information

  • further properties of weight

  • relation to evidential completeness

Thank you!

References

Bradley, S. (2019). Imprecise probabilities. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Spring 2019 ed.). Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/spr2019/entries/imprecise-probabilities/

Dietrich, F., & List, C. (2016). Probabilistic opinion pooling. In A. Hajek & C. Hitchcock (Eds.), Oxford handbook of philosophy and probability. Oxford: Oxford University Press.

van Fraassen, B. C. (2006). Vague expectation value loss. Philosophical Studies, 127(3), 483–491. https://doi.org/10.1007/s11098-004-7821-2

Gärdenfors, P., & Sahlin, N.-E. (1982). Unreliable probabilities, risk taking, and decision making. Synthese, 53(3), 361–386. https://doi.org/10.1007/bf00486156

Joyce, J. M. (2005). How probabilities reflect evidence. Philosophical Perspectives, 19(1), 153–178.

Kaplan, J. (1968). Decision theory and the fact-finding process. Stanford Law Review, 20(6), 1065–1092.

Keynes, J. M. (1921). A treatise on probability. London: Macmillan.

Levi, I. (1974). On indeterminate probabilities. The Journal of Philosophy, 71(13), 391. https://doi.org/10.2307/2025161

Levi, I. (1980). The enterprise of knowledge: An essay on knowledge, credal probability, and chance. MIT Press.

Sturgeon, S. (2008). Reason and the grain of belief. Noûs, 42(1), 139–165. Retrieved from http://www.jstor.org/stable/25177157

Walley, P. (1991). Statistical reasoning with imprecise probabilities. London: Chapman & Hall.